Spectral Signal Processing for ASR
Author
Abstract
The paper begins by discussing the difficulties in obtaining repeatable results in speech recognition. Theoretical arguments are presented for and against copying human auditory properties in automatic speech recognition. The “standard” acoustic analysis for automatic speech recognition, consisting of mel-scale cepstrum coefficients and their temporal derivatives, is described. Some variations and extensions of the standard analysis (PLP, cepstrum correlation methods, LDA, and variants on log power) are then discussed. These techniques pass the test of having been found useful at multiple sites, especially with noisy speech. The extent to which auditory properties can account for the advantage found for particular techniques is considered. It is concluded that the advantages do not in fact stem from auditory properties, and that there is so far little or no evidence that the study of the human auditory system has contributed to advances in automatic speech recognition. Contributions in the future are not, however, ruled out.
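As a rough illustration of the "standard" analysis the abstract refers to (mel-scale cepstrum coefficients plus their temporal derivatives), the following is a minimal NumPy sketch. It is not taken from the paper: the function names, the mel formula 2595·log10(1 + f/700), the 25 ms / 10 ms framing at 16 kHz, the 24 filters, the 13 coefficients, and the regression-based delta over ±2 frames are all assumptions based on common practice.

```python
import numpy as np


def hz_to_mel(f):
    # One widely used mel-scale formula (an assumption, not from the paper).
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters with centres spaced evenly on the mel scale."""
    n_bins = n_fft // 2 + 1
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    hz_points = mel_to_hz(mel_points)
    bin_freqs = np.linspace(0.0, sample_rate / 2.0, n_bins)
    fb = np.zeros((n_filters, n_bins))
    for i in range(n_filters):
        lo, mid, hi = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (bin_freqs - lo) / (mid - lo)
        falling = (hi - bin_freqs) / (hi - mid)
        fb[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb


def mel_cepstrum(signal, sample_rate=16000, frame_len=400, hop=160,
                 n_fft=512, n_filters=24, n_ceps=13):
    """Frame -> window -> power spectrum -> mel filterbank -> log -> DCT."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2
    fb = mel_filterbank(n_filters, n_fft, sample_rate)
    log_mel = np.log(power @ fb.T + 1e-10)  # small floor avoids log(0)
    # Unnormalised DCT-II basis, keeping the first n_ceps coefficients.
    k = np.arange(n_ceps)[:, None]
    n = np.arange(n_filters)[None, :]
    dct = np.cos(np.pi * k * (n + 0.5) / n_filters)
    return log_mel @ dct.T


def deltas(features, width=2):
    """Temporal derivatives as a linear-regression slope over +/- width frames."""
    n_frames = len(features)
    padded = np.pad(features, ((width, width), (0, 0)), mode="edge")
    out = np.zeros_like(features, dtype=float)
    for t in range(1, width + 1):
        ahead = padded[width + t: width + t + n_frames]
        behind = padded[width - t: width - t + n_frames]
        out += t * (ahead - behind)
    return out / (2.0 * sum(t * t for t in range(1, width + 1)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    noise = rng.standard_normal(16000)  # one second of synthetic noise
    ceps = mel_cepstrum(noise)
    feats = np.hstack([ceps, deltas(ceps)])  # static + delta features per frame
    print(feats.shape)
```

The regression form of the delta is used here because it smooths over several frames rather than differencing adjacent ones; other derivative estimates would fit the abstract's description equally well.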
Similar resources
A comparison of front-ends for bitstream-based ASR over IP
Automatic Speech Recognition (ASR) is called to play a relevant role in the provision of spoken interfaces for IP-based applications. However, as a consequence of the transit of the speech signal over these particular networks, ASR systems need to face two new challenges: the impoverishment of the speech quality due to the compression needed to fit the channel capacity and the inevitable occurr...
Polarizing beam splitters constructed of form-birefringent multilayer gratings
We introduce a novel polarizing beam splitter that uses the anisotropic spectral reflectivity (ASR) characteristics of a high spatial frequency multilayer binary grating. By combining the form birefringence effect of a high spatial frequency grating with the resonant reflectivity of a periodic multilayer structure, the ASR characteristics for the two orthogonal linear polarizations are obtained...
Phase-Aware Signal Processing for Automatic Speech Recognition
Conventional automatic speech recognition (ASR) often neglects the spectral phase information in its front-end and feature extraction stages. The aim of this paper is to show the impact that enhancement of the noisy spectral phase has on ASR accuracy when dealing with speech signals corrupted with additive noise. Apart from proof-of-concept experiments using clean spectral phase, we also presen...
Identifying the human-machine differences in complex binaural scenes: what can be learned from our auditory system
Previous comparisons of human speech recognition (HSR) and automatic speech recognition (ASR) focused on monaural signals in additive noise, and showed that HSR is far more robust against intrinsic and extrinsic sources of variation than conventional ASR. The aim of this study is to analyze the man-machine gap (and its causes) in more complex acoustic scenarios, particularly in scenes with two ...
Automatic speech recognition with primarily temporal envelope information
The aim of this study is to devise a computational method to predict cochlear implant (CI) speech recognition. Here, we describe a high-throughput screening system for optimizing CI speech processing strategies using hidden Markov model (HMM)-based automatic speech recognition (ASR). Word accuracy was computed on vocoded CI speech synthesized from primarily multi-channel temporal envelope infor...
A Novel Sampling Approach in GNSS-RO Receivers with Open Loop Tracking Method
Propagation of radio occultation (RO) signals through the lower troposphere results in high phase acceleration and low signal to noise ratio signal. The excess Doppler estimation accuracy in lower troposphere is very important in receiving RO signals which can be estimated by sliding window spectral analysis. To do this, various frequency estimation methods such as MUSIC and ESPRIT can be adopt...